Uploading a CSV file to HF

I have a CSV file that contains two columns:

  1. Column-1: Titles (Image description/caption)
  2. Column-2: Images (Image URL)

Assume that each row contains only one image URL and its corresponding description (in my actual data a row may contain multiple image URLs, which I will have to clean manually).

Now, using this dataset, I want to fine-tune the Salesforce BLIP model (base or BLIP-2). To do that, I have found a Google Colab notebook that uses a football dataset for training.

There I can see code along the lines of load_dataset("repository/data", split="train"). Now, I have some queries regarding this.

  1. When uploading my data to an HF repo, should the data contain image URLs, or must it contain the raw images?
  2. For the split="train" part, should I divide my complete data into train.csv and test.csv and then upload them to HF? Or is there more code to write, or does the load_dataset function automatically divide the dataset into a train-test split when split="train" is passed?

I am unable to find proper documentation for this process, and it would be very helpful if someone could help me resolve this issue.

Google Colab Notebook for Reference


I think it’s better to download the data first, create a dataset that includes the actual images, and then upload it, as this reduces the risk of encountering download errors during training. However, either method should work. Additionally, there is an option to create a script for loading the dataset.
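For example, here is a minimal sketch of the "prepare the images in advance" approach. The file name my_captions.csv, the column names Titles/Images, and the repo id your-username/blip-captions are all placeholders for your own values:

```python
# Sketch: read the CSV, download each image locally, attach the images to the
# dataset, create the splits yourself, and push everything to the Hub.
import os
import requests
from datasets import load_dataset, Image

os.makedirs("images", exist_ok=True)

# 1. Load the raw CSV (two columns: caption text and image URL).
ds = load_dataset("csv", data_files="my_captions.csv", split="train")

# 2. Download each image to a local file and record its path.
def download_image(example, idx):
    path = f"images/{idx}.jpg"
    with open(path, "wb") as f:
        f.write(requests.get(example["Images"], timeout=30).content)
    example["image"] = path
    return example

ds = ds.map(download_image, with_indices=True)

# 3. Treat the path column as actual image data so the pixels are stored
#    in the dataset rather than just the URL strings.
ds = ds.cast_column("image", Image())

# 4. load_dataset(..., split="train") only selects an existing split; it does
#    not create one, so make the train/test split here.
ds = ds.train_test_split(test_size=0.1)

# 5. Upload both splits; later you can call
#    load_dataset("your-username/blip-captions", split="train").
ds.push_to_hub("your-username/blip-captions")
```

After this, no download from the original URLs happens during training, and split="train" simply picks the training split you uploaded.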

Ultimately, you need to decide whether to have the Trainer’s DataCollator download the data from the URL or to prepare the dataset in advance and use the datasets library.
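If you keep only URLs in the dataset, a rough sketch of the collate-at-training-time alternative looks like this. The column names (Titles, Images) are again assumptions; the processor is the standard BLIP processor from transformers:

```python
# Sketch: keep only URLs in the dataset and fetch the images per batch.
# A failed download here interrupts training, which is why preparing the
# images in advance is usually the safer option.
import io
import requests
from PIL import Image
from transformers import BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

def collate_fn(batch):
    images = [
        Image.open(io.BytesIO(requests.get(row["Images"], timeout=30).content)).convert("RGB")
        for row in batch
    ]
    captions = [row["Titles"] for row in batch]
    return processor(images=images, text=captions, padding=True, return_tensors="pt")
```

You would then pass this function as the data_collator to the Trainer, or as collate_fn to a plain DataLoader.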
